-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching: Add CACHE_VERSION
attribute to CalcJob
and Parser
#6328
Caching: Add CACHE_VERSION
attribute to CalcJob
and Parser
#6328
Conversation
@danielhollas @unkcpz I tried to implement something that would allow plugin developers to still control cache invalidation if their plugins change significantly, after we remove all version information from hash calculation. However, it is not as easy as it seemed. Essentially, we need to allow a plugin to define some class attribute that enters in the hash computation. This has to be stored on the node somehow though and cannot just be taken from the class when the node gets stored, because otherwise in the future, when the hash of the node is recalculated, it will take the current cache version of the plugin class, which may have been updated. Since it is important information, I didn't want to store this "cache version" to the extras, since it can get lost, since they are mutable. So the We have had talks in the past to have "internal" attributes that are set just by |
Yeah, I realized late last night the hash recalculation will make things tricky :-( |
It does. Because currently the aiida-core/src/aiida/orm/nodes/caching.py Line 67 in 1059b5f
which would be removed by #6215 Note that for calcjobs the version of both core and plugin is also stored in the attributes. But since that is fixed, that doesn't suffer from the same problem when rehashing. But #6215 also proposes to remove this from the hash calculation. |
Sorry, I don't follow, can you expand on this part? If those versions are stored (under which key?) why is adding an extra field problematic? |
The important part here is that the behavior is different for data and process nodes. The
This is fine for process nodes since those cannot be subclassed by plugins and so anyway they cannot control what is stored in the attributes. This is different for data plugins, though, where the attributes are essentially the one thing that plugins can control. As we have seen, the |
I see, that is subtle indeed. So if I understand correctly, the approach in this PR could work for Processes, because you could modify aiida-core to add it to ProcessNode attributes? Or does this have unintended side effects as well? |
Hmm, perhaps we could indeed only add it to |
The use case is clear to me now after a first go of reviewing. Just one comment.
I think this can be a very straightforward implementation, the hash is already stored in the extra, I didn't see why it is a problem to not store the cache version to extra as well.
Indeed, the direct purpose is to avoid the changing of Process input/output interfaces that will bring the false positive.
@danielhollas If I understand correctly, there will be no hash recalculation, if the plugin update the cache version, the new ProcessNode will just has different hashing and the old |
Yeah, that makes a lot of sense! I guess I don't even understand how caching works for Data node, or what even would be a purpose of that? I thought of caching as always tied to Processes?
Any specific reason for this to be a string? An |
8daadfc
to
9730f7a
Compare
Well, there's a difference in consequences when these extras are dropped. If the hash is dropped, the node should simply not be available for caching (right?) so you'll get a false negative for cache hit. But if the cache version somehow got dropped, then you could get false positive cache hits (but I guess only after the hash would also be recalculated?). These considerations do seem a bit academical. At the same time, adding it to Process attributes does not seem any more difficult in terms of implementation?
Sure, but it's possible that due to DB migration induced by AiIDA-core, we would ask users to recalculate their hashes manually (as we did last time, although whether that's necessary is up to discussion I guess). |
Data nodes are not really cached in that sense, they just have a hash which is used in the computation of the process' hash.
No particular reason. Could even accept either. For now I haven't really thought about limitations/validation of this class attribute. First wanted to see if the generic concept would be possible.
@unkcpz The problem here as Daniel has pointed out is that the public API (and CLI as well) have methods to clear the hash of the extras. And on top of that, the hash is just stored in the extras, which can be (accidentally) deleted. If at that point you have to recalculate the hash, you better make sure you have all the data that went in the original hash. So we need to permanently store this specific cache version (and cannot rely on extras as it can be accidentally removed). |
9730f7a
to
4df4753
Compare
I have updated the implementation to just change the behavior for |
src/aiida/plugins/utils.py
Outdated
@@ -92,5 +93,6 @@ def get_version_info(self, plugin: str | type) -> dict[t.Any, dict[t.Any, t.Any] | |||
return self._cache[plugin] | |||
|
|||
self._cache[plugin][KEY_VERSION_ROOT][KEY_VERSION_PLUGIN] = version_plugin | |||
self._cache[plugin][KEY_VERSION_ROOT][KEY_VERSION_CACHE] = getattr(plugin, 'CACHE_VERSION', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow the logic in this function, but should we try to only do this for Process subclasses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other classes? The only other node subclasses are Data
and we just went through why that would cause issues while it not even being necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being clear. If I understand this function, it is defined for anything that a plugin defines, e.g. both Data and Process classes. Since now the CACHE_VERSION is only on Process classes, this means that self._cache[plugin][KEY_VERSION_ROOT][KEY_VERSION_CACHE]
will always be None
for Data classes? Or am I totally off mark here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that this PluginVersionProvider
is only used by the Process
class (through the Runner
) to add the result to its attributes. So Data
nodes are not affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this part of the code:
def _setup_version_info(self) -> None: |
I briefly went through the implementation and mostly makes sense to me. Now it got me thinking, should aiida-core use the same mechanism for invalidating caches, instead of forcing a DB migration? E.g. instead of putting a aiida version itself in the cache, have CACHE_VERSION defined somewhere (effectively globally). DB migration is a chore (I suspect), but the biggest issue is that it is not reversible. |
9fcafea
to
1388ae1
Compare
Very interesting idea. I think this might actually work and would indeed get rid of having to add a database migration to reset the cache in case the implementation changes. I don't see any immediate problems with it, but as always, the devil is in the details with caching. I think the best thing would be to simply try so I think I will give it a shot and see what comes up. |
Thought the idea over a bit. Although it should properly "invalidate" existing caches since the hash of future processes with identical inputs will now be different, due to the changed We might say that this theoretical problem is so unlikely that it doesn't really justify the inconvenience that migrations add. However, if we were to accept this solution, we would go back to the previous problem where we would have to store this global cache version somewhere on a node when it gets stored so later its hash can be recomputed. But with the current database layout, that could only be stored in the attributes, and that gives problems for the I am therefore leaning to just keeping database migrations to invalidate hashes if pinging @danielhollas |
Sorry, on a conference this week. Need to think this through a bit, I'll try to reply on Wednesday if I am able. By Friday the latest. |
Hm, I am not sure I fully follow this scenario. What exactly do you mean by "incorrect" in this case? It seems to me that the hash will be consistent with the CACHE_VERSION that it was created with, no?
This seems to describe a hash collision? Those should be vanishingly unlikely right? I mean you can get a collision regardless of this scheme by pure bad luck? But I might be misunderstanding what you're trying to say.
I thought that above we decided we don't really have to worry about the Data nodes, since they are defined by construction by their hash? I guess you're right that in case of aiida-core it might not be the same situation, if the core hashing algorithm is somehow changed.
I think for this PR let's go with that. We can always decide to rethink/improve on the aiida-core later I guess. |
I agree that hash collisions are always a risk, but since they are inherent to hashing algorithms and extremely unlikely, we shouldn't worry about them. However, what I was trying to sketch out, is that hashing collisions are to be expected and fine as long as your hashing algorithm doesn't change. But here, we were talking about the scenario where the hashing implementation in What I was thinking in my comment above is that hash collissions are acceptable as long as you stay within the same algorithm. However, when we change the objects to hash, we are changing the algorithm and then collisions risks no longer follow the rules of the hashing algorithm that lies underneath, right? Anyway, there is a more practical objection I think. Let's go back to why we want to introduce some mechanism to invalidate hashes. The reason we have been removing hash extras through database migrations is not to prevent false positives but rather to make sure that caching keeps working (we just didn't think that including the versions would fully undo this work 😅 ). If the hashing algorithm changes, the computed hash for a data node with attributes This is exactly the opposite motiviation for the feature we are discussing adding here, which is to give control to calcjob plugins to invalidate their hash if their implementation changes such that the outputs are expected to change even for identical inputs. Conclusion: I think we should keep using database migrations if |
That's a very good point. 👍 Agreed. I am not sure what is the state of the current PR, feel free to ping me for review. |
I think the current summary of the situation is as follows:
This is currently implemented by PR #6215 and this PR. So I think this is ready for review. |
I just thought of one more potential complication. Adding a cache version for the |
Yeah, unfortunate but reasonable. At some point I was also thinking about if we could somehow detect Parser/CalcJob changes automatically via some code introspection. But even if we were able to do that, it would kind of defeat the purpose since even non-functional changes would affect the caching behaviour. |
1297858
to
436662c
Compare
6335b91
to
2e46256
Compare
@danielhollas @unkcpz I have now implemented the final state of the discussion. It essentially adds the
Some open questions/action points:
|
+1 If we realize we need it to be more generic, we can always lift it up the class chain.
Seems ok to me.
I would vote to make it an
I am wondering if it would be clearer to have a separate attribute "cache_version" or something like that. That would make it easier to see that we do not hash anything inside |
2e46256
to
6ab6e60
Compare
Done. For now there is no validation though. I am not sure we need it.
Very good point, I like it. I have implemented it. |
6ab6e60
to
c16a73b
Compare
@danielhollas this should be done for final review. I have added the docs as well |
Node.CACHE_VERSION
to node attributesCACHE_VERSION
attribute to CalcJob
and Parser
c16a73b
to
a29554f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sphuber this looks great, thank you! Just some nits.
a29554f
to
e705186
Compare
Recently, the hashing mechanism was changed to remove all version information, both `aiida-core` and the plugin if applicable, from the hash computation. The reason was that with that information, each version change in core and plugin would essentially reset the cache. By removing the version information, updating the code would no longer invalidate all the cache. The downside of this change, however, is that when the implementation of a plugin, most notably `CalcJob` and `Parser` plugins, changes significantly such that a change in its attributes/inputs would really change its content, this would no longer be reflected in the hash, which would remain identical. This could lead to false positives when users update certain plugins. To give some control to plugin developers in case of changes where the hash would have to be effectively reset, the `CalcJob` and `Parser` classes now define the `CACHE_VERSION` class attribute. By default this is set to `None` but it can be set to an integer by a plugin developer. At this point, it is stored in the `cache_version` attribute under the `calc_job` or `parser` key, respectively. Since this attribute is included in the hash calculation, changing the value would effectively reset the cache for nodes generated with one of the plugins. The concept _could_ have been added to the `Node` base class to stay generic, however, this had complicates for `Data` nodes. Since the version data would have to be stored in the attributes, in order to prevent the risk from it being mutated or lost, it would interfere with the actual data of the node. For example, the `Dict` node is entirely defined by its attributes, so AiiDA cannot store any data in that namespace. Luckily, not having this cache version information in nodes other than `CalcJobNode`s is not a problematic, as the problem of inducing false positives by changing plugin code is really only relevant to `CalcJob` and `Parser` plugins.
2a1745e
to
0a6bc89
Compare
Recently, the hashing mechanism was changed to remove all version
information, both
aiida-core
and the plugin if applicable, from thehash computation. The reason was that with that information, each
version change in core and plugin would essentially reset the cache.
By removing the version information, updating the code would no longer
invalidate all the cache.
The downside of this change, however, is that when the implementation of
a plugin, most notably
CalcJob
andParser
plugins, changessignificantly such that a change in its attributes/inputs would really
change its content, this would no longer be reflected in the hash, which
would remain identical. This could lead to false positives when users
update certain plugins.
To give some control to plugin developers in case of changes where the
hash would have to be effectively reset, the
CalcJob
andParser
classes now define the
CACHE_VERSION
class attribute. By default thisis set to
None
but it can be set to an integer by a plugin developer.At this point, it is stored in the
cache_version
attribute under thecalc_job
orparser
key, respectively. Since this attribute isincluded in the hash calculation, changing the value would effectively
reset the cache for nodes generated with one of the plugins.
The concept could have been added to the
Node
base class to staygeneric, however, this had complicates for
Data
nodes. Since theversion data would have to be stored in the attributes, in order to
prevent the risk from it being mutated or lost, it would interfere with
the actual data of the node. For example, the
Dict
node is entirelydefined by its attributes, so AiiDA cannot store any data in that
namespace. Luckily, not having this cache version information in nodes
other than
CalcJobNode
s is not a problematic, as the problem ofinducing false positives by changing plugin code is really only relevant
to
CalcJob
andParser
plugins.Recently, the hashing mechanism was changed to remove all versioninformation, both
aiida-core
and the plugin if applicable, from thehash computation. The reason was that with that information, each
version change in core and plugin would essentially reset the cache.
By removing the version information, updating the code would no longer
invalidate all the cache.
The downside of this change, however, is that when the implementation ofa plugin, most notably a
CalcJob
plugin, changes significantly suchthat a change in its attributes/inputs would really change its content,
this would no longer be reflected in the hash, which would remain
identical. This could lead to false positives when users update certain
plugins.
To give some control to plugin developers in case of changes where the
hash would have to be effectively reset, the
Node
class now definesthe
CACHE_VERSION
class attribute. By default this is set toNone
but a
Data
plugin can change it to some string value. To facilitatethe same API for process plugins, such as
CalcJob
's, theCACHE_VERSION
attribute is also added to theProcess
base class.This value is automatically added to the node's attributes, under the
key
_aiida_cache_version
such that is is included in the computed hash.This way, a plugin developer can change this attribute in their plugin
to force the hash to change.
The version has to be stored in the attributes of the node and cannotjust be taken from the plugin's class attribute directly when computing
the hash, because in that case when the hash is recalculated in the
future, a potentially different value of
CACHE_VERSION
could be used(whichever version the plugin has that is installed at that time).